feat: Add DuckDB plugin by andreahlert · Pull Request #633 · flyteorg/flyte-sdk

andreahlert · 2026-02-08T16:14:42Z

Summary

Add a DuckDB connector plugin for running SQL queries against DuckDB as Flyte tasks
DuckDB is an embedded analytical database (like SQLite for OLAP) that runs locally and synchronously
Follows the same architecture and patterns as the Snowflake plugin

Features

In-memory and file-based database support via DuckDBConfig(database_path=...)
Parameterized SQL queries with typed inputs
Extension installation and loading (httpfs, json, spatial, etc.)
Query results returned as pandas DataFrames via temporary parquet files
Automatic cleanup of temporary result files on delete

Design

Unlike Snowflake/BigQuery, DuckDB runs locally with no remote service, so the connector pattern is adapted:

Queries execute synchronously in create() (wrapped in run_in_executor for async compat)
get() always returns SUCCEEDED since queries complete in create()
delete() cleans up temporary parquet result files
No credentials, polling, or dashboard links needed
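
The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `LocalQueryConnector`, `run_query_blocking`, and the job-id scheme are hypothetical names, the standard-library `sqlite3` module stands in for DuckDB so the sketch runs without extra dependencies, and results are kept in a dict rather than in temporary parquet files.

```python
import asyncio
import sqlite3
from functools import partial


def run_query_blocking(sql: str, params: tuple = ()) -> list[tuple]:
    """Run one query synchronously against an in-memory embedded database."""
    # Assumption: sqlite3 is a stand-in here; the plugin would use duckdb.
    con = sqlite3.connect(":memory:")
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()


class LocalQueryConnector:
    """Hypothetical connector in which all work happens inside create()."""

    def __init__(self) -> None:
        # Assumption: results live in memory; the PR writes temp parquet files.
        self._results: dict[str, list[tuple]] = {}

    async def create(self, job_id: str, sql: str, params: tuple = ()) -> str:
        # Offload the blocking call to the default thread pool so the
        # event loop stays responsive while the query runs.
        loop = asyncio.get_running_loop()
        rows = await loop.run_in_executor(
            None, partial(run_query_blocking, sql, params)
        )
        self._results[job_id] = rows
        return job_id

    async def get(self, job_id: str) -> tuple[str, list[tuple]]:
        # The query already finished in create(), so there is nothing to poll.
        return "SUCCEEDED", self._results[job_id]

    async def delete(self, job_id: str) -> None:
        # Drop the stored result (the analogue of removing a temp file).
        self._results.pop(job_id, None)


async def demo() -> tuple[str, list[tuple]]:
    connector = LocalQueryConnector()
    job_id = await connector.create("job-1", "SELECT ? + ?", (40, 2))
    status = await connector.get(job_id)
    await connector.delete(job_id)
    return status
```

Calling `asyncio.run(demo())` runs the query in a worker thread and returns the status together with the fetched rows.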

Test plan

25 unit tests covering task creation, config serialization, SQL generation, connector create/get/delete, parameterized queries, extensions, and end-to-end flow
All tests pass (pytest plugins/duckdb/tests/ -v)
Linting passes (ruff check plugins/duckdb/)
Plugin structure matches Snowflake plugin layout

Add a DuckDB connector plugin following the same patterns as the Snowflake plugin. DuckDB is an embedded analytical database that runs queries locally and synchronously, so the connector executes queries in create() and get() always returns SUCCEEDED.

Features:
- In-memory and file-based database support
- Parameterized SQL queries with typed inputs
- Extension installation and loading (httpfs, json, etc.)
- Query results returned as pandas DataFrames via temp parquet files
- Automatic cleanup of temporary result files

Signed-off-by: André Ahlert <[email protected]>

kumare3

I think you got it a little wrong

andreahlert · 2026-02-08T23:11:01Z

I think you got it a little wrong

Thanks for the review! You're right, I should have looked at the existing flytekit DuckDB plugin as reference instead of modeling it after Snowflake. I'll rework this to use TaskTemplate with execute() and add DataFrame input type support.

Drop the AsyncConnector pattern (connector.py, dataframe.py) which is designed for remote services. DuckDB runs locally and needs guaranteed memory isolation, so the plugin now subclasses TaskTemplate directly with an async execute() method, following the same pattern as ContainerTask.

Changes:
- Accept pd.DataFrame, pa.Table and flyte.io.DataFrame as inputs registered as virtual tables via con.register()
- Support parameterized queries with ? and $N placeholders
- Support multi-query execution (list of queries)
- Support runtime queries via 'query' string input
- Remove connector dependency and entry-point from pyproject.toml
- 26 tests covering execution, DataFrame inputs, params, multi-query, runtime queries, extensions and serialization

Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>

Required for DataFrame input registration and Arrow table output. Without these the CI test environment fails on import.

Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>

- Guard against empty query list (query=[]) which would cause AttributeError on None.to_arrow_table()
- Use startswith("insert") instead of "insert" in query to avoid false matches on column names like insert_date
- Add tests for both edge cases

Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>
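
The two edge cases in this commit can be illustrated with a short sketch. The helper names (`naive_is_insert`, `is_insert`, `validate_queries`) are hypothetical, and the whitespace stripping and lowercasing are assumptions about how the check might be normalized; the plugin's actual code may differ.

```python
def naive_is_insert(query: str) -> bool:
    """Substring check: also matches column names such as insert_date."""
    return "insert" in query.lower()


def is_insert(query: str) -> bool:
    """Prefix check: only matches statements that begin with INSERT."""
    # Assumption: leading whitespace and letter case are normalized first.
    return query.lstrip().lower().startswith("insert")


def validate_queries(queries: list[str]) -> list[str]:
    """Reject an empty query list before anything is executed."""
    if not queries:
        raise ValueError("at least one query is required")
    return queries


select_sql = "SELECT insert_date FROM events"
insert_sql = "  INSERT INTO events VALUES (1)"

print(naive_is_insert(select_sql))  # substring check misfires on the column name
print(is_insert(select_sql))        # prefix check does not
print(is_insert(insert_sql))        # real INSERT statement is still detected
```

Failing early on an empty list keeps the error close to its cause, rather than surfacing later as an AttributeError on a missing result object.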

DuckDB's con.execute() always returns a result object even for DDL statements, so the None guard was unreachable. Replace with a test confirming DDL queries return an empty DataFrame.

Signed-off-by: Andre Ahlert <[email protected]>
Signed-off-by: André Ahlert <[email protected]>

…rames Handle DataFrame inputs that have _raw_df populated (from wrap_df/from_df) by converting directly to Arrow Table instead of trying to open via URI. Signed-off-by: André Ahlert <[email protected]>

samhita-alla · 2026-04-06T06:16:48Z

i'll review the PR later this week. thanks for porting it over!

andreahlert force-pushed the add-duckdb-plugin branch from 47fa959 to 100a36c · February 8, 2026 16:18